我试图将一堆页面保存在创建它们的py文件旁边的文件夹中。我在windows上,所以当我试图在文件路径后面加反斜杠时,它会生成一个特殊字符。在
我说的是:from bs4 import BeautifulSoup
import urllib2, urllib
import csv
import requests
from os.path import expanduser
print "yes"
with open('intjpages.csv', 'rb') as csvfile:
pagereader = csv.reader(open("intjpages.csv","rb"))
i=0
for row in pagereader:
print row
agentheader = {'User-Agent': 'Nerd'}
request = urllib2.Request(row[0],headers=agentheader)
url = urllib2.urlopen(request)
soup = BeautifulSoup(url)
for div in soup.findAll('div', {"class" : "side"}):
div.extract()
body = soup.find_all("div", { "class" : "md" })
name = "page" + str(i) + ".html"
path_to_file = "